A Syntax-based Rule-base for Textual Entailment and a Semantic Truth Value Annotator

Authors

  • Amnon Lotan
  • Roni Katzir
Abstract

Textual entailment (TE) is the semantic inference task that takes two text fragments and determines whether one entails the other. It captures the semantic inferences needed by many text understanding applications. Practical TE applications usually adopt relatively low-level lexical or lexical-syntactic representations of text, which correspond closely to language structure. In many cases, such approaches miss some of the valuable and more abstract information at the generic-syntactic and semantic levels. This thesis first presents a novel, comprehensive, generic syntax-based rule-base for the TE task. It offers over 60 generic entailment rules that substitute between many equivalent or entailing constructions in categories such as active vs. passive, coordination, apposition, determiners, possessives, and case correction. The rules can also extract simplified IS-A and HAS-A implications from over a dozen generic patterns, and decouple relative clauses. Based on inputs from previous salient works on the topic, as well as novel contributions, the rule-base is represented according to the Stanford Dependencies standard, is based on a well-defined formalism, and is made publicly available for use with TE systems, including full documentation and tools for rule design and software compilation. The rules are honed for high resilience to syntactic structure diversity, especially in RTE texts, which are used in the field’s most common benchmark. Qualitative and quantitative in-depth evaluations are reported, including a manual dataset analysis. They demonstrate the high potential of knowledge resources of this type in TE, the wide coverage this rule-base has over the required set of syntax-based transformations in this setting, its high accuracy, and how it significantly improves the performance of a concrete entailment engine.
Additionally, I show that much work is needed for such systems to make better use of syntax-based resources. I then introduce TruthTeller, a novel algorithm and system that takes the syntactic parse tree of a given sentence and identifies the semantic truth value of each predicate and clause. In contrast with previous work, it is the first such system meant to serve the natural language inference research community as an open-source tool, enriching the conventional syntax tree representation. Some possible uses for these annotations are inferring parts of a sentence from the whole, improving similarity (and contradiction) measures between texts, and improving the accuracy of entailment rule matching. As a side product, TruthTeller also annotates negation and classifies predicates by implication type (factive, implicative, etc.). The download package also includes a lexicon of predicates and their implication types, the largest of its kind and the first to be made publicly available. Both the lexicon and TruthTeller’s annotations are shown to have good accuracy, recall and precision.
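
The idea of assigning a truth value to each predicate can be illustrated with a minimal sketch. It is not TruthTeller's actual algorithm or data format: the `SIGNATURES` table, the `Node` class and the `annotate` function are hypothetical names invented here, the four-entry lexicon is a toy, and a chain of predicate and complement stands in for a real dependency parse. Each predicate takes the truth value of its context, flipped if it is negated, and its implication type then decides the context passed down to its complement.

```python
from dataclasses import dataclass

# Toy lexicon (an assumption for illustration, not the thesis's lexicon).
# For each predicate: truth value of the predicate -> truth value of its complement.
# "+" = true, "-" = false, "?" = unknown.
SIGNATURES = {
    "know":   {"+": "+", "-": "+"},  # factive: complement holds either way
    "manage": {"+": "+", "-": "-"},  # implicative: complement follows the predicate
    "fail":   {"+": "-", "-": "+"},  # implicative: complement is reversed
    "want":   {"+": "?", "-": "?"},  # no commitment about the complement
}


@dataclass
class Node:
    """A predicate in a simplified parse: lemma, negation flag, optional complement."""
    lemma: str
    negated: bool = False
    complement: "Node | None" = None


def flip(value):
    """Reverse a truth value; unknown stays unknown."""
    return {"+": "-", "-": "+", "?": "?"}[value]


def annotate(node, context="+"):
    """Return (lemma, truth value) pairs from the root predicate downward."""
    value = flip(context) if node.negated else context
    result = [(node.lemma, value)]
    if node.complement is not None:
        if value == "?":
            child_context = "?"
        else:
            # Predicates missing from the toy lexicon make no commitment.
            child_context = SIGNATURES.get(node.lemma, {}).get(value, "?")
        result.extend(annotate(node.complement, child_context))
    return result


# "John did not fail to open the door."
sentence = Node("fail", negated=True, complement=Node("open"))
print(annotate(sentence))
```

Under these assumptions, the negated "fail" is annotated as false and "open" as true, which matches the intuition that the door was opened.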

Similar resources

Semantic Annotation for Textual Entailment Recognition

We introduce a new semantic annotation scheme for the Recognizing Textual Entailment (RTE) dataset as well as a manually annotated dataset that uses this scheme. The scheme addresses three types of modification that license entailment patterns: restrictive, appositive and conjunctive, with a formal semantic specification of these patterns’ contribution for establishing entailment. These inferen...

Recognizing Textual Entailment Using Description Logic and Semantic Relatedness

Reda Siblini, Ph.D., Concordia University, 2014. Textual entailment (TE) is a relation that holds between two pieces of text where one reading the first piece can conclude that the second is most likely true. Accurate approaches for textual entailment can be beneficial to various natural language processing (NLP) appl...

Survey in Textual Entailment

Variability of semantic expression is a fundamental phenomenon of natural language, where the same meaning can be expressed by different texts. Natural Language Processing applications such as Question Answering, Summarization, and Information Retrieval systems often demand a generic framework to capture major semantic inferences in order to deal with the challenges created by this phenomenon. Textu...

KitAi: Textual Entailment Recognition System for NTCIR-10 RITE2

This paper describes Japanese textual entailment recognition systems for NTCIR-10 RITE2. The tasks that we participated in are the Japanese BC subtask and the ExamBC subtask. Our methods are based on some machine learning techniques with surface level, syntax and semantic features. We use two ontologies, the Japanese WordNet and Nihongo-Goi-Taikei, and Hierarchical Directed Acyclic Graph (HDAG)...

Abductive Reasoning with a Large Knowledge Base for Discourse Processing

This paper presents a discourse processing framework based on weighted abduction. We elaborate on ideas described in Hobbs et al. (1993) and implement the abductive inference procedure in a system called Mini-TACITUS. Particular attention is paid to constructing a large and reliable knowledge base for supporting inferences. For this purpose we exploit such lexical-semantic resources as WordNet ...

Publication date: 2012